
Accelerating finite-rate chemical kinetics with coprocessors: comparing vectorization methods on GPUs, MICs, and CPUs



Abstract

Efficient ordinary differential equation (ODE) solvers for chemical kinetics must take into account the available thread- and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock solver and a nonstiff Runge-Kutta solver are implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms with OpenCL. The performance of these parallel implementations was measured with three chemical kinetic models across several multicore and many-core platforms. Two runtime benchmarks were conducted to clearly determine any performance advantage offered by either method: evaluating the right-hand-side source terms in parallel, and integrating a series of constant-pressure homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and the many-core Xeon Phi coprocessor performed approximately three times faster than the baseline multithreaded code. The SIMT model on the host and Phi was 13-35% slower than the baseline, while the SIMT model on the GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased 2.5-2.7x with the SIMD implementations on the host CPU and 4.7-4.9x with the Xeon Phi coprocessor compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.4-1.6x faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower vector widths of the Sandy Bridge CPU or the Xeon Phi.
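The abstract attributes the cross-platform differences to adaptive step sizes, which make each reactor take a different number of integration steps. Below is a minimal sketch of that effect. It assumes a scalar toy problem y' = -k*y in place of a real chemical mechanism, and it swaps in an embedded Heun/Euler pair for whatever Runge-Kutta scheme the paper actually used, since the abstract does not name one. The function name `heun_euler_adaptive` and all tolerances are hypothetical.

```python
import math


def heun_euler_adaptive(f, t0, y0, t_end, rtol=1e-6, atol=1e-10, h0=1e-6):
    """Integrate the scalar ODE y' = f(t, y) from t0 to t_end.

    Hypothetical illustration: an embedded Heun (order 2) / explicit Euler
    (order 1) pair supplies the local error estimate that drives the
    adaptive step size. Returns (y_end, accepted_steps, rejected_steps).
    """
    t, y, h = t0, y0, h0
    accepted = rejected = 0
    while t < t_end:
        h = min(h, t_end - t)               # never step past t_end
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y_high = y + 0.5 * h * (k1 + k2)    # Heun's method, order 2
        y_low = y + h * k1                  # explicit Euler, order 1
        err = abs(y_high - y_low)           # local error estimate
        tol = atol + rtol * max(abs(y), abs(y_high))
        if err <= tol:
            # Land exactly on t_end when the step was clamped to reach it.
            t = t_end if h == t_end - t else t + h
            y = y_high
            accepted += 1
        else:
            rejected += 1
        # Step-size controller: the estimate scales like h**2, so rescale h
        # by (tol/err)**(1/2) with a safety factor and bounded growth/shrink.
        factor = 0.9 * math.sqrt(tol / err) if err > 0 else 5.0
        h *= min(5.0, max(0.2, factor))
    return y, accepted, rejected


if __name__ == "__main__":
    # Hypothetical stand-ins for independent reactors: y' = -k*y with
    # different rate constants k, all integrated with the same tolerances.
    for k in (1.0, 2.0, 5.0, 10.0):
        y, acc, rej = heun_euler_adaptive(lambda t, y, k=k: -k * y, 0.0, 1.0, 1.0)
        print(f"k={k:5.1f}  y(1)={y:.6e}  exact={math.exp(-k):.6e}  "
              f"accepted={acc}  rejected={rej}")
```

In this sketch the accepted-step count grows roughly in proportion to k at fixed tolerances. Reactors like these, packed into SIMD lanes or a GPU warp, would therefore finish at different times even though they run identical code.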
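The divergence argument at the end of the abstract can be illustrated with a simplified lockstep cost model. In this model, problems are packed `width` at a time into a SIMD vector or GPU warp, and each group keeps all of its lanes occupied until its slowest member finishes. This sketch rests on several assumptions: it ignores memory bandwidth, masking overhead, and scheduling. The name `lockstep_efficiency`, the log-normal step-count distribution, and the widths 4, 8, and 32 are illustrative choices, not values from the paper.

```python
import math
import random


def lockstep_efficiency(step_counts, width):
    """Hypothetical cost model: fraction of lane-iterations doing useful work
    when consecutive problems are packed `width` per vector (SIMD) or warp
    (SIMT) and every group runs until its slowest member finishes."""
    groups = math.ceil(len(step_counts) / width)
    occupied = 0
    for g in range(groups):
        chunk = step_counts[g * width:(g + 1) * width]
        # Finished lanes and padding lanes still occupy their slots.
        occupied += width * max(chunk)
    return sum(step_counts) / occupied


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical per-reactor step counts, log-normally scattered around ~200.
    steps = [max(1, round(rng.lognormvariate(math.log(200), 0.5)))
             for _ in range(3200)]
    for width in (1, 4, 8, 32):  # illustrative widths only
        print(f"width={width:2d}  efficiency={lockstep_efficiency(steps, width):.3f}")
```

Suppose the groups nest, meaning each width divides the next and the number of problems is a multiple of the largest width. Then a wide group's cost is at least the sum of the costs of the narrower groups inside it, so under this model efficiency can only stay the same or drop as the width grows.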
